Plaque Begets Plaque, ApoB Does Not

The Statistics Cause Doubt


John Slough

Study Summary

Plaque Begets Plaque, ApoB Does Not — Soto-Mota et al., 2025

  • Trial ID: NCT05733325

  • Design: 1-year prospective cohort using coronary CT angiography (CCTA)

  • Participants: 100 lean, metabolically healthy adults on keto ≥2 yrs

    • LDL-C ≥190 mg/dL, HDL-C ≥60 mg/dL, TG ≤80 mg/dL, ApoB 185 ± 51 mg/dL

  • Findings: No link between ApoB/LDL-C (baseline/change) and plaque progression
  • Predictors: Baseline plaque (CAC, NCPV, TPS, PAV) strongly predicted progression (ΔNCPV)
  • Stats: Bayesian analysis favored no ApoB–plaque link (6–10× vs alt)

  • Conclusion: Plaque begets plaque; ApoB does not
  1. Univariable Linear Models
  2. ΔNCPV as Outcome Variable
  3. Bayesian Inference
  4. Bayesian Prior Choice
  5. Bayes Factor Interpretation
  6. Additional Concerns

Univariable Linear Models

At best: exploratory.

At worst: misleading.

1 Univariable Linear Models

“Linear models on the primary (NCPV) and secondary outcomes were univariable”

Despite having multiple predictors available (age, sex, ApoB, BMI, Triglycerides, Systolic blood pressure, CAC, NCPV₀, LDL-C exposure),
each was tested separately in single-predictor regressions.

This modeling choice introduces omitted-variable bias:

“…omitting a relevant variable from a model which explains the independent and dependent variable leads to biased estimates.” - Wilms (2021)


A predictor may appear significant in a univariable model, but its effect can vanish once relevant covariates are included.

This is very common, especially in observational studies, with small datasets and highly correlated predictors.

1 Univariable Linear Models

Example: ΔNCPV, ApoB and Age

They modeled:

\[ \Delta \text{NCPV} = \alpha + \beta \text{ApoB} + \varepsilon \]

But if age also predicts ΔNCPV and correlates with ApoB, then \(\beta\) is biased and partly reflects the effect of age.

A more appropriate model would be:

\[ \Delta \text{NCPV} = \alpha + \beta_1 \text{ApoB} + \beta_2 \text{Age} + \varepsilon \]

This separates the contribution of ApoB from that of age.


Univariable linear models make omitted-variable bias and confounding very likely, especially in small non-randomized human data. This undermines any claim of association or non-association.
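A minimal simulation sketch of this bias, in R. All numbers are invented for illustration (they are not study data), and the sample is deliberately large so the bias is not masked by noise. ΔNCPV is generated from age alone, yet ApoB looks predictive until age enters the model.

```r
set.seed(42)
n <- 1000  # large on purpose; all values below are hypothetical

age  <- rnorm(n, mean = 55, sd = 10)
apob <- 150 + 1.5 * (age - 55) + rnorm(n, sd = 20)       # ApoB correlates with age
delta_ncpv <- 20 + 1.0 * (age - 55) + rnorm(n, sd = 10)  # driven by age only, not ApoB

uni <- lm(delta_ncpv ~ apob)        # univariable: age omitted
adj <- lm(delta_ncpv ~ apob + age)  # adjusted for age

slope_uni <- unname(coef(uni)["apob"])
slope_adj <- unname(coef(adj)["apob"])
c(univariable = slope_uni, adjusted = slope_adj)
```

In the univariable fit the ApoB slope absorbs part of the age effect; in the adjusted fit it shrinks toward zero. The same mechanism can work in the other direction and hide a real association.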

1 Univariable Linear Models

But wait:

“Estimated lifetime LDL-C exposure was only a significant predictor of final NCPV in the univariable analysis but lost significance when age was included as a covariate. Both age and lifetime LDL-C exposure lost significance when baseline CAC was included in the model.”

Additionally, Table 3 includes one multivariable regression on the primary outcome \(\Delta \text{NCPV}\):

  • \(\Delta \text{NCPV} = \alpha + \beta_1 \, \text{CAC}_{\text{bl}} + \beta_2 \, \Delta \text{ApoB} + \beta_3 \, (\text{CAC}_{\text{bl}} \times \Delta \text{ApoB}) + \varepsilon\)
  • Appears to be the only adjusted model involving \(\Delta \text{NCPV}\)

But the paper provides absolutely no explanation or rationale for this model.


So they did use three multivariable models. One on the main outcome \(\Delta \text{NCPV}\) and two in their Age Mediation Analysis on NCPV at follow-up.

They never modeled \(\Delta \text{NCPV}\) with a full set of available covariates.


This selective use raises questions.

1 Univariable Linear Models

“Neither change in ApoB…baseline ApoB, nor total LDL-C exposure… were associated with the change in noncalcified plaque volume (NCPV) or TPS. All baseline plaque metrics (coronary artery calcium, NCPV, total plaque score, and percent atheroma volume) were strongly associated with the change in NCPV.”


Stating this in the abstract based solely on univariable regressions is a textbook case of overstating conclusions based on methodologically inadequate statistical models.


At best, this reflects naïve reporting. At worst, it’s actively misleading.

ΔNCPV as Outcome Variable

Plaque begets plaque: biology or mathematical artifact?

2 ΔNCPV as Outcome Variable

“All baseline plaque metrics (coronary artery calcium, NCPV, total plaque score, and percent atheroma volume) were strongly associated with the change in NCPV.”

Change in noncalcified plaque volume \(\Delta \text{NCPV}\) was the outcome:

\[ \Delta \text{NCPV} = \text{NCPV}_{1} - \text{NCPV}_0 \]

They regressed \(\Delta \text{NCPV}\) directly on its baseline value \(\text{NCPV}_0\):

\[ \Delta \text{NCPV} = \alpha + \beta \, \text{NCPV}_0 + \varepsilon \]

But this introduces mathematical coupling, because \(\text{NCPV}_0\) appears on both sides of the equation:

\[ \text{NCPV}_{1} - \text{NCPV}_0 = \alpha + \beta \, \text{NCPV}_0 + \varepsilon \]

2 ΔNCPV as Outcome Variable

“Mathematical coupling occurs when one variable directly or indirectly contains the whole or part of another, and the two variables are then analysed using correlation or regression. As a result, the statistical procedure of testing the null hypothesis — that the coefficient of correlation or the slope of regression is zero — might no longer be appropriate.” - Tu & Gilthorpe, 2007

Regression model: \(\text{NCPV}_{1} - \text{NCPV}_0 = \alpha + \beta \, \text{NCPV}_0 + \varepsilon\)


\(\beta = \frac{\operatorname{Cov}(\Delta \text{NCPV},\ \text{NCPV}_0)}{\operatorname{Var}(\text{NCPV}_0)} = \frac{\operatorname{Cov}(\text{NCPV}_1 - \text{NCPV}_0,\ \text{NCPV}_0)}{\operatorname{Var}(\text{NCPV}_0)}\)

\(= \frac{\operatorname{Cov}(\text{NCPV}_1,\ \text{NCPV}_0) - \operatorname{Var}(\text{NCPV}_0)}{\operatorname{Var}(\text{NCPV}_0)} = \frac{\rho\, \sigma_1 \sigma_0 - \sigma_0^2}{\sigma_0^2}\)

\(= \frac{\rho\, \sigma_1 - \sigma_0}{\sigma_0} = \rho \cdot \frac{\sigma_1}{\sigma_0} - 1\)

where:

  • \(\rho = \operatorname{Cor}(\text{NCPV}_0,\ \text{NCPV}_1)\)
  • \(\sigma_0\), \(\sigma_1\) = SDs at baseline and follow-up

With baseline NCPV contributing to both predictor and outcome, the slope reflects a mix of true correlation (\(\rho\)), variability ratio (\(\sigma_1/\sigma_0\)), and the structural −1 term that arises because \(\text{NCPV}_0\) is subtracted inside the outcome.

It is not a clean estimate of baseline influence.
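The identity \(\beta = \rho \cdot \frac{\sigma_1}{\sigma_0} - 1\) can be checked numerically. The sketch below uses simulated values (hypothetical, not study data) and compares the slope from lm() with the value computed from the sample correlation and standard deviations.

```r
set.seed(1)
n <- 100
ncpv0 <- rexp(n, rate = 1 / 50)            # hypothetical right-skewed baseline values
ncpv1 <- 1.3 * ncpv0 + rnorm(n, sd = 15)   # hypothetical follow-up values
delta <- ncpv1 - ncpv0                     # change score

slope_lm <- unname(coef(lm(delta ~ ncpv0))[2])       # slope of delta on baseline
rho <- cor(ncpv0, ncpv1)
slope_formula <- rho * sd(ncpv1) / sd(ncpv0) - 1     # rho * sigma1 / sigma0 - 1

c(lm = slope_lm, formula = slope_formula)
```

The two numbers agree to floating-point precision for any data set, because the relationship is algebraic rather than empirical.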

2 ΔNCPV as Outcome Variable

From Oldham test* (1962): \(\beta > 0 \quad\text{if}\quad \rho > \frac{\sigma_0}{\sigma_1}\)

The slope depends on:

  • correlation between baseline and follow-up (\(\rho\))
  • relative spread of the two time points

Because coupling alone pushes \(\beta\) downward by 1, a positive \(\beta\) is possible only when \(\rho \cdot \frac{\sigma_1}{\sigma_0} > 1\). That inequality can be satisfied without any causal baseline effect, for example, if \(\sigma_1\) exceeds \(\sigma_0\) because of simple growth, or if measurement reliability inflates \(\rho\). A positive slope therefore does not by itself demonstrate a biological baseline influence; it merely tells us that the positive \(\rho \cdot \frac{\sigma_1}{\sigma_0}\) component has outweighed the −1 artifact.

In this study:

  • Almost all participants had increased NCPV, so \(\rho\) is plausibly high
  • if increases were heterogeneous, \(\sigma_1\) is also likely > \(\sigma_0\)

A positive slope in a mathematically coupled regression signals only that baseline–follow-up correlation and/or variance growth were strong enough to offset the unavoidable −1 artifact. Whether it reflects true biology, and whether it exaggerates or understates that biology, cannot be determined from β alone.

*Source: Oldham, 1962, J. Chronic Dis.

2 ΔNCPV as Outcome Variable

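set.seed(1) # Fix the random seed so the example is reproducible
n <- 100 # Set the number of values to generate
baseline <- rnorm(n, mean = 100, sd = 10) # Create 100 random numbers centered around 100
follow_up <- rnorm(n, mean = 120, sd = 10) # Create another 100 independent random numbers, centered around 120
delta <- follow_up - baseline # Subtract the first set from the second to get the difference
coef(lm(follow_up ~ baseline)) # Slope near 0: no true association
coef(lm(delta ~ baseline)) # Slope near -1: association created by the subtraction alone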


No true association

Association due to mathematical coupling

2 ΔNCPV as Outcome Variable

An alternative is to model NCPV at follow-up (\(\text{NCPV}_1\)) directly while adjusting for baseline NCPV. This example uses ApoB as the independent variable:

\[ \text{NCPV}_1 = \alpha + \gamma\,\text{NCPV}_0 + \beta\,\text{ApoB} + \varepsilon \]

This approach avoids mathematical coupling, reduces residual variance, and allows the coefficient on \(\text{ApoB}\) to reflect biological association, not algebraic structure.
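A minimal R sketch of the baseline-adjusted model, on simulated data. The variable names and all generating values are assumptions for illustration; only the ApoB mean and SD echo the study summary. No ApoB effect is built into the simulation.

```r
set.seed(7)
n <- 100
ncpv0 <- rexp(n, rate = 1 / 50)                   # hypothetical baseline NCPV
apob  <- rnorm(n, mean = 185, sd = 51)            # ApoB with the mean and SD from the study summary
ncpv1 <- 10 + 1.2 * ncpv0 + rnorm(n, sd = 15)     # follow-up depends on baseline only

fit <- lm(ncpv1 ~ ncpv0 + apob)   # follow-up as outcome, baseline as covariate
gamma_hat <- unname(coef(fit)["ncpv0"])
beta_hat  <- unname(coef(fit)["apob"])
summary(fit)$coefficients
```

Here the baseline coefficient estimates how follow-up scales with baseline, and the ApoB coefficient is read conditional on baseline. Neither is shifted by a built-in −1.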


You could test whether baseline NCPV predicts follow-up using a mixed-effects model with a Time × Baseline interaction:

\[ \text{NCPV}_{ij} = \alpha + \gamma\,\text{Time}_{ij} + \beta\,\text{NCPV}_{0j} + \delta\,(\text{Time}_{ij} \cdot \text{NCPV}_{0j}) + b_j + \varepsilon_{ij} \]

but with this number of subjects, this must be done carefully.

2 ΔNCPV as Outcome Variable

You cannot determine whether a positive or negative slope from ΔNCPV ∼ NCPV₀ reflects biology or math, because the math builds the relationship. To isolate biological effects, you must model follow-up directly with baseline as a covariate — not as part of the outcome.




Plaque begets plaque: biology or a mathematical artifact?


Bayesian Modeling

Unusual. Fragile. Overstated.

3 Bayesian Inference

Frequentist:

Assuming there is no true association between ApoB and ΔNCPV, how likely is it that we’d observe a slope as large as (or larger than) the one we found, just by chance?

If the p-value exceeds 0.05 (p > 0.05), a frequentist analysis can only say:

“We did not find sufficient evidence to reject the hypothesis that ApoB has no association with ΔNCPV”

It cannot say the null is likely true, or produce the probability that there is no association, just that the data were inconclusive.


Bayesian:

How well do the data fit under two competing models, one with no association (null), and one with a range of plausible effect sizes for ApoB (alternative)?

A Bayes factor (e.g., BF₀₁ = 6) allows a stronger statement:

“The observed data are 6 times more likely (moderate evidence) under the ‘no association’ model than under the alternative model that assumes some effect from ApoB (as defined by the prior).”

  • Frequentist: “No evidence of effect”
  • Bayesian: “Evidence for no effect”

3 Bayesian Inference

“Since lack of statistical significance (ie, P > 0.05) should not be interpreted as evidence in favor of the null but simply a failure to reject the null, the addition of Bayesian inference adds credence to finding that there is no association between NCPV vs LDL-C or ApoB…”

So, they turn to Bayesian inference to “support” their finding that ApoB has no association with plaque progression.


This is unusual in a non-randomized, uncontrolled, 1-year observational study on a highly restricted sample:

  • Study design not suited for strong inferences about presence or absence of associations
  • Univariable, unadjusted models reduce the credibility of any statistical conclusion
  • Bayesian inference is used to imply absence of effect, not just lack of evidence


Despite the limited model and context, they present the result as confirmatory.

3 Bayesian Inference


They are applying a stronger-sounding statistical framework onto a structurally weak analysis.

This is a misuse of Bayesian inference.


Not because Bayesian methods are invalid.


Because they’re being used to amplify certainty in an analysis that lacks adjustment, control, or transparency about its assumptions.

4 Bayesian Prior Choice

“Bayes factors were calculated using BayesFactor::regressionBF… and an ~ rscale value of 0.8 to contrast a moderately informative prior with a conservative distribution width (to allow for potential large effect sizes) due to the well-documented association between ApoB changes and coronary plaque changes”


Bayesian Prior: represents your belief about likely effect sizes before seeing the data.

From the BayesFactor documentation for the parameter rscaleCont:

“Several named values are recognized: ‘medium’, ‘wide’, and ‘ultrawide’, which correspond to rscales of √2/4, 1/2, and √2/2, respectively.”

  • “medium” → rscale = √2 / 4 ≈ 0.354
  • “wide” → rscale = 0.5
  • “ultrawide” → rscale = √2 / 2 ≈ 0.707

An rscale of 0.8 is wider than “ultrawide”. It is not a “moderately informative” prior. It is a weakly informative, or vague, prior that spreads much of its weight over large effects.

A moderately informative prior would typically correspond to “medium” (≈ 0.354) or “wide” (0.5), which place more mass on smaller effects.

4 Bayesian Prior Choice

The authors’ prior choice isn’t wrong, but their description of it is misleading.

Labeling an r = 0.8 prior as “moderately informative” or “conservative” downplays the fact that it assumes large effects, making small observed effects look unlikely under H₁ and inflating support for H₀.


Their choice of prior is subjective, influential, and not tested for robustness.

  • The model was set up to expect large ApoB effects, so small observed effects are treated as evidence for no effect.

  • Best practice is to run a sensitivity analysis, to see whether conclusions change with different priors.

4 Bayesian Prior Choice

Prior Scale Sensitivity Analysis on ΔNCPV ~ ApoB Model


Bayes factor sensitivity analysis
rscale BF₁₀ BF₀₁
0.100 0.530 1.889
0.250 0.288 3.473
0.350 0.217 4.608
0.500 0.157 6.363
0.707 0.113 8.834
0.800 0.100 9.954
1.000 0.081 12.374

This kind of rscale sensitivity analysis is standard for default Bayes factors, but it’s a limited diagnostic — it tests only prior width, not prior plausibility or model fit.
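A rough base-R sketch of how such a sensitivity sweep can be computed. It numerically integrates the Zellner-Siow (JZS) mixture-of-g Bayes factor for a regression with p predictors, which is assumed here to correspond to the default regression Bayes factor; it has not been checked against BayesFactor output. The helper names, R² = 0.01 and n = 100 are hypothetical inputs, not values taken from the paper.

```r
# Log density of an inverse-gamma(shape a, scale b); base R has no dinvgamma().
log_dinvgamma <- function(g, a, b) a * log(b) - lgamma(a) - (a + 1) * log(g) - b / g

# BF10 for a linear model with p predictors versus the intercept-only model,
# integrating over g ~ inverse-gamma(1/2, rscale^2 * n / 2).
bf10_jzs <- function(r2, n, p = 1, rscale = 0.8) {
  integrand <- function(g) {
    log_val <- 0.5 * ((n - p - 1) * log1p(g) - (n - 1) * log1p(g * (1 - r2))) +
      log_dinvgamma(g, a = 0.5, b = rscale^2 * n / 2)
    val <- exp(log_val)
    val[!is.finite(val)] <- 0   # guard against NaN at the g = 0 boundary
    val
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

rscales <- c(0.1, 0.25, 0.35, 0.5, 0.707, 0.8, 1)
bf10 <- sapply(rscales, function(r) bf10_jzs(r2 = 0.01, n = 100, rscale = r))
data.frame(rscale = rscales, BF10 = bf10, BF01 = 1 / bf10)
```

With a small observed R², BF₀₁ grows as the prior widens: the wider the range of effects the alternative is allowed to predict, the worse it fits a near-zero slope.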

5 Bayes Factor Interpretation

“In other words, these data suggest it is 6 to 10 times more likely that the hypothesis of no association between these variables (the null) is true as compared to the alternative.”

A Bayes factor of 6–10 means the data are 6–10× more likely under the null model than under the alternative model, not that the null hypothesis is 6–10× more likely to be true.

They could have said: “The data are 6–10 times more likely under the no-association model than under the alternative.”


Bayes factors update prior odds into posterior odds. Claiming the null is 6–10× more likely assumes equal prior odds, something the authors never stated.

Source: Bayes Factors – Kass & Raftery, 1995
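As a worked illustration (the prior odds here are hypothetical, chosen only to show the arithmetic), posterior odds are the Bayes factor multiplied by the prior odds:

\[ \frac{P(H_0 \mid D)}{P(H_1 \mid D)} = \text{BF}_{01} \times \frac{P(H_0)}{P(H_1)} \]

With \(\text{BF}_{01} = 6.3\) and equal prior odds (1:1), the posterior odds are 6.3, i.e. \(P(H_0 \mid D) = 6.3 / 7.3 \approx 0.86\). A reader who started at 1:4 odds against the null would end at posterior odds of \(6.3 \times 0.25 \approx 1.58\), i.e. \(P(H_0 \mid D) \approx 0.61\). The same Bayes factor supports very different statements about which hypothesis is "more likely to be true".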

5 Bayes Factor Interpretation

Even without other issues (e.g. confounding / non-adjusted variables / short follow-up / non-RCT, etc.), the reported BF of 6.3 for ΔNCPV ~ ApoB reflects only moderate evidence for no effect — not strong or decisive.

Table 1. A heuristic classification scheme for Bayes factors BF₁₀. Source: SpringerLink

(Assuming BF₁₀ as per standard conventions; if BF₀₁, it reverses)

Summary

  1. Univariable regressions
    Each predictor tested separately (ApoB, LDL-C, age…)
    No confounder adjustment → biased, unreliable, low credibility estimates
  2. ΔNCPV as outcome
    Regressed (NCPV₁ − NCPV₀) on baseline NCPV₀
    Mathematical coupling → regression slope reflects algebra, not just biology
  3. Bayesian inference
    Bayes inference used to support “no ApoB effect” & “plaque begets plaque”
    Unadjusted, observational data → misleading, unusual use of Bayesian inference
  4. Prior choice (rscale = 0.8)
    Prior assumes large effects
    No sensitivity analysis → results likely prior-driven
  5. Bayes factor interpretation
    Claimed null is “6–10× more likely”
    Bayes factor misstated as posterior probability → compares model fit, not truth
  6. Headline claim
    “Plaque Begets Plaque, ApoB Does Not”
    Overstates evidence → statistically inadequate, selectively modeled, fragile analysis

Additional Concerns

No Adjustment for Multiple Comparisons

At least 14 distinct linear regressions were reported, with likely many more from exploratory models referenced in figures and supplements.

Numerous tests increase false positive risk, yet no multiple testing correction was applied.

Options include:

  • Bonferroni: α = 0.05 → α′ = 0.05/14 ≈ 0.004 for 14 tests
    Baseline CAC (P < 0.001) would survive; others might not
  • Benjamini–Hochberg FDR: controls the false discovery rate and retains more power than Bonferroni, including with positively correlated tests
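Both corrections are available in base R through p.adjust(). The p-values below are hypothetical, made up to illustrate a family of 14 tests; they are not the paper's results.

```r
# Hypothetical p-values for 14 regressions (illustrative only, not from the paper)
p <- c(0.0005, 0.002, 0.004, 0.01, 0.02, 0.03, 0.04,
       0.09, 0.33, 0.52, 0.57, 0.63, 0.80, 0.91)
alpha <- 0.05

bonf_threshold <- alpha / length(p)              # per-test threshold under Bonferroni
p_bonf <- p.adjust(p, method = "bonferroni")     # family-wise error rate control
p_bh   <- p.adjust(p, method = "BH")             # false discovery rate control

data.frame(p, p_bonf, p_bh,
           keep_bonf = p_bonf < alpha,
           keep_bh   = p_bh < alpha)
```

In this made-up example seven tests pass at the unadjusted 0.05 level, but fewer survive either correction, and Benjamini–Hochberg retains more than Bonferroni.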

Perhaps the authors viewed this as exploratory, where correction is often skipped —
but then why title the paper “Plaque Begets Plaque, ApoB Does Not”?

Zero-inflation, censoring, heteroscedasticity

Baseline NCPV median = 44 mm³; TPS median = 0, so at least 50% of TPS values are zero
CCTA cannot report negative plaque → both outcomes are left-censored at 0

This affects not just modeling but measurement:
When true plaque ≈ 0, error is asymmetric — it can only overestimate.

ΔNCPV, their primary outcome, is a change score between two bounded, skewed measures.
Likely to produce non-normal residuals and heteroscedasticity (e.g., larger spread at higher baseline).

If smaller baseline values were also linked to larger increases,
this may reflect the effects of left-censoring and error asymmetry — not true biological acceleration.

These issues are clear in TPS, and may affect NCPV, but diagnostics are not shown.

OLS assumes homoscedastic, normal residuals
performance::check_model() was run — but no output provided

They could have considered methods to address this such as: Tobit regression, log-transform, or robust SEs

Sensitivity Analysis

Study abstract: “Plaque progression predictors were assessed with linear regression and Bayes factors. Diet adherence and baseline cardiovascular disease risk sensitivity analyses were performed.”
“Sensitivity analyses on participants with >80% of bHB measurements above 0.3 mmol/L (Supplemental Tables 2 to 4) and with high calculated 10-year cardiovascular risk showed similar results to those just reported.”

The authors conducted what appear to be post hoc “sensitivity analyses” in two subgroups:

  • High adherence (\(n = 56\)): >80% of \(\beta\)-hydroxybutyrate values ≥ 0.3 mmol/L
  • High CVD risk (\(n = 28\)): MESA 10-year risk >5%


The subgroups may partially overlap — the paper doesn’t say.
Without clarity, it’s unclear if results reflect replication or redundancy.


The same individuals may be contributing to both sets of sensitivity analyses.

This limits interpretability and weakens claims of consistency across groups.

Sensitivity Analysis

Exact same modeling strategy:

  • Univariable linear regressions
  • No covariate adjustment or interaction testing on main outcomes reported (except one model)
  • No rationale for subgroup thresholds

Results: “similar results to those just reported”. But these are smaller samples with more noise.

Model | Table 3 (Full, n = 100) | Supplemental Table 4 (High Adherence, n = 56) | Supplemental Table 5 (High CVD Risk, n = 28)
ΔNCPV ~ ΔApoB | β = 0.01, P = 0.91, BF > 10.0 | β = 0.04, P = 0.63, BF = 6.90 | β = 0.10, P = 0.57, BF = 4.76
ΔNCPV ~ ApoB₀ | β = 0.06, P = 0.33, BF = 6.3 | β = 0.06, P = 0.09, BF = 1.83 | β = 0.11, P = 0.52, BF = 4.57

Compared to the full sample, the high adherence group offers only anecdotal evidence for the null.
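The labels can be made explicit with a small R helper. The cut-points follow the commonly used heuristic scheme (1–3 anecdotal, 3–10 moderate, 10–30 strong, and so on); the function name is hypothetical, and the Bayes factors are the ΔNCPV ~ ApoB₀ values from the table above, read as evidence for the null.

```r
# Map Bayes factors (>= 1) onto the conventional heuristic evidence labels.
classify_bf <- function(bf) {
  cut(bf,
      breaks = c(1, 3, 10, 30, 100, Inf),
      labels = c("anecdotal", "moderate", "strong", "very strong", "extreme"),
      right = FALSE)   # intervals are [1, 3), [3, 10), ...
}

bf01 <- c(full = 6.3, high_adherence = 1.83, high_cvd_risk = 4.57)
labels <- as.character(classify_bf(bf01))
data.frame(BF01 = bf01, evidence_for_null = labels)
```

None of the three reaches the "strong" band, and the high-adherence subgroup falls in the weakest one.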

Labeling this “sensitivity analysis” suggests robustness, but nothing was done to test robustness statistically.

Possibly Underpowered Study

  • Registered primary endpoint: %Δ NCPV over 12 mo; the study was not specifically sized to detect an ApoB effect.
  • With n = 100, a single-predictor linear regression reaches 80% power only for medium and large effects, not for small ones.
  • A null result after one year may be due to low power, not proof of no ApoB effect.
  • Note: even if it is adequately powered, it doesn’t negate all the other issues like mathematical coupling, confounding, non-adjustment, short follow-up etc.
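A back-of-the-envelope power check in base R, using the noncentral F distribution and Cohen's conventional f² benchmarks (0.02 small, 0.15 medium, 0.35 large). The function name is hypothetical, and the calculation assumes a single predictor, α = 0.05, and a well-specified linear model.

```r
# Power of the F test for one predictor in a linear regression with n observations.
power_single_predictor <- function(f2, n, alpha = 0.05) {
  u <- 1                         # numerator df (one predictor)
  v <- n - 2                     # denominator df
  lambda <- f2 * (u + v + 1)     # noncentrality parameter (Cohen's convention)
  f_crit <- qf(1 - alpha, u, v)  # critical value under the null
  pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)
}

f2 <- c(small = 0.02, medium = 0.15, large = 0.35)
power <- sapply(f2, power_single_predictor, n = 100)
round(power, 2)
```

Under these assumptions a small effect is detected well under 80% of the time at n = 100, so a non-significant ApoB slope is what a small true effect would usually produce.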

On the Heterogeneity of LMHR

“It should be emphasized that this includes heterogeneity in progression (and regression) across the population.” - KETO-CTA study

“If, despite our results show that CVD risk among LMHRs is heterogeneous (and thus, a pooled summary isn’t a good idea), you must have a numerical pooled NCPVchange value, it is: p50=18.8 mm3 IQR(37.3).” - X post from Author


The authors invoke heterogeneity to downplay the pooled NCPV change, yet they ran univariable regressions and interpreted Bayes factors on that same pooled group.


If a group is too heterogeneous to report pooled outcomes, it is also too heterogeneous for pooled inferences about predictors or mechanisms.

If their CVD risk (e.g., plaque progression) is not coherent, then the category fails as a predictive or explanatory label.

All clinical populations are heterogeneous in risk.

On the Heterogeneity of LMHR

“p50=18.8 mm3 IQR(37.3).”

With a median of 18.8 mm³ and only 1–2 individuals showing regression, the IQR of 37.3 mm³ must be mostly skewed upward, not balanced.

Here a wide IQR reflects high inter-individual variability in how much NCPV progressed among the LMHRs, since all but a few individuals had more plaque at follow-up.

In other words: With nearly all LMHRs showing plaque progression, a wide IQR (37.3 mm³) doesn’t indicate balanced variability—it reflects differing degrees of worsening.


This matters because:

  • The outcome (ΔNCPV) shows high variability across individuals in the LMHR group.
  • That inflates standard errors, weakens statistical power, and makes small observed effects more ambiguous.
  • It calls for adjusted modeling to reduce noise and account for confounding, not pooled univariable regressions and Bayes factors interpreted as strong evidence.

Notes on Letters to the Editor

Letter to the Editor - External researchers raise concerns about the study’s methodology

Response to the Letter - Study authors respond to concerns

From the response:

“Regarding the analytical points brought forward, we are aware of the relevance of linear assumptions to obtain accurate estimators. Since residual plot evaluation can also be subjective…”

Objective, quantitative statistical tests also exist for assessing model assumptions:

  • Normality: Shapiro-Wilk, Kolmogorov–Smirnov, Q-Q plots (visual / quantitative)
  • Homoscedasticity: Breusch–Pagan, White test
  • Influence / leverage: Cook’s distance, leverage plots (visual based on quantitative values)

The performance::check_model() function they cite automatically generates most of these checks.

It’s unclear why they wouldn’t include or reference the output.
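For reference, these checks take only a few lines of base R. The sketch below runs them on simulated data that is heteroscedastic by construction (hypothetical values, not study data); the Breusch–Pagan statistic is computed by hand in Koenker's studentized form to avoid extra packages.

```r
set.seed(3)
n <- 100
x <- rexp(n, rate = 1 / 50)                       # hypothetical skewed predictor
y <- 5 + 0.3 * x + rnorm(n, sd = 2 + 0.2 * x)     # noise grows with x
fit <- lm(y ~ x)

# Normality of residuals
sw <- shapiro.test(residuals(fit))

# Breusch-Pagan (studentized): regress squared residuals on the predictor,
# then compare n * R^2 with a chi-squared distribution (df = number of predictors)
aux <- lm(residuals(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

# Influence: Cook's distance, flagged with the common 4/n rule of thumb
cooks <- cooks.distance(fit)
flagged <- which(cooks > 4 / n)

list(shapiro_p = sw$p.value, bp_stat = bp_stat, bp_p = bp_p, n_flagged = length(flagged))
```

Reporting numbers like these alongside the residual plots would remove most of the subjectivity the response refers to.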

“…we followed their suggestion and re-ran all models with robust linear regression…as expected, there were small differences with the published estimates, all models using robust regression were consistent with what was reported.”

No output, diagnostics, or model fit provided. We are asked to trust their assertion.

Notes on Letters to the Editor

“We agree that being able to identify patients with rapid plaque progression, and gaining better understanding of the mechanisms that mediate its pace (i.e. insulin resistance, inflammation, different dietary composition elements, etc.) is paramount. We plan to address these risk factors in future reports”

Are they admitting their analysis lacks adjustment for likely confounders, despite drawing strong conclusions from univariable regressions?

Just run and report the multivariable, adjusted models.


“Moreover, our results are compatible with a causal role of ApoB in atherosclerosis, as we have openly acknowledged and supported in previous publications.”


“Plaque Begets Plaque, ApoB does not”

Notes on Letters to the Editor

“Along the same lines, we would like to clarify that our title was not meant to be a statement about causality. “Plaque begets plaque” (which, of course, mirrors the proverb “Money begets money”) is frequently used to highlight the strong and clinically relevant association of baseline plaque values with plaque progression rate [7].”

That journal citation [7] used a random-effects repeated-measures model, a type of longitudinal multivariable regression that accounts for the non-independence of repeated observations within individuals, and controlled for baseline calcium score and traditional risk factors.

This approach was chosen after detecting heteroskedasticity in a preliminary multivariable linear regression. Both models were multivariable (adjusted), not univariable.


“In retrospect, we might have chosen “Longitudinal Data from the KETO-CTA Study” as alternative phrasing to avoid misinterpretations.”

Notes on Letters to the Editor

“misinterpretations”

Plaque Begets Plaque, ApoB does not